Loading COVID-19 Data: Vaccinations and Deaths¶

This block of code loads two datasets: COVID-19 vaccination data and COVID-19 deaths data for the United States.

In [1]:
import pandas as pd

# Load the vaccination data
vaccinations_path = 'us_state_vaccinations.csv'
vaccinations_data = pd.read_csv(vaccinations_path)

# Load the deaths data
deaths_path = 'time_series_covid19_deaths_US.csv'
deaths_data = pd.read_csv(deaths_path)

# Display the first few rows of each dataset
vaccinations_data.head(), deaths_data.head()
Out[1]:
(         date location  total_vaccinations  total_distributed  \
 0  2021-01-12  Alabama             78134.0           377025.0   
 1  2021-01-13  Alabama             84040.0           378975.0   
 2  2021-01-14  Alabama             92300.0           435350.0   
 3  2021-01-15  Alabama            100567.0           444650.0   
 4  2021-01-16  Alabama                 NaN                NaN   
 
    people_vaccinated  people_fully_vaccinated_per_hundred  \
 0            70861.0                                 0.15   
 1            74792.0                                 0.19   
 2            80480.0                                  NaN   
 3            86956.0                                 0.28   
 4                NaN                                  NaN   
 
    total_vaccinations_per_hundred  people_fully_vaccinated  \
 0                            1.59                   7270.0   
 1                            1.71                   9245.0   
 2                            1.88                      NaN   
 3                            2.05                  13488.0   
 4                             NaN                      NaN   
 
    people_vaccinated_per_hundred  distributed_per_hundred  \
 0                           1.45                     7.69   
 1                           1.53                     7.73   
 2                           1.64                     8.88   
 3                           1.77                     9.07   
 4                            NaN                      NaN   
 
    daily_vaccinations_raw  daily_vaccinations  daily_vaccinations_per_million  \
 0                     NaN                 NaN                             NaN   
 1                  5906.0              5906.0                          1205.0   
 2                  8260.0              7083.0                          1445.0   
 3                  8267.0              7478.0                          1525.0   
 4                     NaN              7498.0                          1529.0   
 
    share_doses_used  total_boosters  total_boosters_per_hundred  
 0             0.207             NaN                         NaN  
 1             0.222             NaN                         NaN  
 2             0.212             NaN                         NaN  
 3             0.226             NaN                         NaN  
 4               NaN             NaN                         NaN  ,
         UID iso2 iso3  code3    FIPS   Admin2 Province_State Country_Region  \
 0  84001001   US  USA    840  1001.0  Autauga        Alabama             US   
 1  84001003   US  USA    840  1003.0  Baldwin        Alabama             US   
 2  84001005   US  USA    840  1005.0  Barbour        Alabama             US   
 3  84001007   US  USA    840  1007.0     Bibb        Alabama             US   
 4  84001009   US  USA    840  1009.0   Blount        Alabama             US   
 
          Lat      Long_  ... 2/28/23  3/1/23  3/2/23  3/3/23  3/4/23  3/5/23  \
 0  32.539527 -86.644082  ...     230     232     232     232     232     232   
 1  30.727750 -87.722071  ...     724     726     726     726     726     726   
 2  31.868263 -85.387129  ...     103     103     103     103     103     103   
 3  32.996421 -87.125115  ...     109     109     109     109     109     109   
 4  33.982109 -86.567906  ...     261     261     261     261     261     261   
 
    3/6/23  3/7/23  3/8/23  3/9/23  
 0     232     232     232     232  
 1     726     726     727     727  
 2     103     103     103     103  
 3     109     109     109     109  
 4     261     261     261     261  
 
 [5 rows x 1155 columns])

Data Preparation: Converting Dates and Aggregating Deaths by State and Date¶

This data preparation process ensures that the date columns are in a consistent format and facilitates the analysis of COVID-19 deaths over time and across different states.

In [2]:
# Convert date columns to datetime for both datasets
vaccinations_data['date'] = pd.to_datetime(vaccinations_data['date'])

# For the deaths data, extract dates from the columns and melt the dataframe
date_columns = deaths_data.columns[12:]  # skipping non-date columns
deaths_data_melted = deaths_data.melt(id_vars=['Province_State'], value_vars=date_columns,
                                      var_name='date', value_name='deaths')
deaths_data_melted['date'] = pd.to_datetime(deaths_data_melted['date'])

# Aggregate deaths by state and date
deaths_by_state_date = deaths_data_melted.groupby(['Province_State', 'date']).sum().reset_index()

# Inspect the processed data for deaths
deaths_by_state_date.head()
C:\Users\varsh\AppData\Local\Temp\ipykernel_3264\834874680.py:8: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  deaths_data_melted['date'] = pd.to_datetime(deaths_data_melted['date'])
Out[2]:
Province_State date deaths
0 Alabama 2020-01-22 0
1 Alabama 2020-01-23 0
2 Alabama 2020-01-24 0
3 Alabama 2020-01-25 0
4 Alabama 2020-01-26 0

Date Range Comparison between Vaccination and Death Data¶

The code extracts the minimum and maximum dates from each dataset to determine the range of dates covered. By comparing these date ranges, it allows for assessing whether the datasets cover the same time period or if there are discrepancies. This information is crucial for ensuring data consistency and for conducting further analysis, such as correlating vaccination rates with mortality rates over the same time frame.

In [3]:
# Check the date ranges for both datasets
vaccination_date_range = (vaccinations_data['date'].min(), vaccinations_data['date'].max())
death_date_range = (deaths_by_state_date['date'].min(), deaths_by_state_date['date'].max())

vaccination_date_range, death_date_range
Out[3]:
((Timestamp('2020-12-20 00:00:00'), Timestamp('2023-05-10 00:00:00')),
 (Timestamp('2020-01-22 00:00:00'), Timestamp('2023-03-09 00:00:00')))

Filtering and Merging COVID-19 Data¶

Filtering and merging COVID-19 vaccination and deaths data for overlapping date range

In [4]:
# Filter both datasets to the overlapping date range
start_date = pd.Timestamp('2020-12-20')
end_date = pd.Timestamp('2023-03-09')

filtered_vaccinations = vaccinations_data[(vaccinations_data['date'] >= start_date) & (vaccinations_data['date'] <= end_date)]
filtered_deaths = deaths_by_state_date[(deaths_by_state_date['date'] >= start_date) & (deaths_by_state_date['date'] <= end_date)]

# Merge the datasets on state and date
merged_data = pd.merge(filtered_vaccinations, filtered_deaths, left_on=['location', 'date'], right_on=['Province_State', 'date'], how='inner')

# Renaming some columns for clarity and inspecting the merged dataset
merged_data = merged_data.rename(columns={'location': 'state', 'people_vaccinated': 'total_people_vaccinated'})
merged_data.head()
Out[4]:
date state total_vaccinations total_distributed total_people_vaccinated people_fully_vaccinated_per_hundred total_vaccinations_per_hundred people_fully_vaccinated people_vaccinated_per_hundred distributed_per_hundred daily_vaccinations_raw daily_vaccinations daily_vaccinations_per_million share_doses_used total_boosters total_boosters_per_hundred Province_State deaths
0 2021-01-12 Alabama 78134.0 377025.0 70861.0 0.15 1.59 7270.0 1.45 7.69 NaN NaN NaN 0.207 NaN NaN Alabama 5573
1 2021-01-13 Alabama 84040.0 378975.0 74792.0 0.19 1.71 9245.0 1.53 7.73 5906.0 5906.0 1205.0 0.222 NaN NaN Alabama 5760
2 2021-01-14 Alabama 92300.0 435350.0 80480.0 NaN 1.88 NaN 1.64 8.88 8260.0 7083.0 1445.0 0.212 NaN NaN Alabama 5945
3 2021-01-15 Alabama 100567.0 444650.0 86956.0 0.28 2.05 13488.0 1.77 9.07 8267.0 7478.0 1525.0 0.226 NaN NaN Alabama 6030
4 2021-01-16 Alabama NaN NaN NaN NaN NaN NaN NaN NaN NaN 7498.0 1529.0 NaN NaN NaN Alabama 6119
In [5]:
import plotly.io as pio
pio.renderers.default = 'notebook'

COVID-19 Vaccinations and Deaths Heatmaps¶

COVID-19 Vaccinations and Deaths Heatmaps: This code generates heatmaps using Plotly to visualize COVID-19 vaccinations and deaths data aggregated by month and state, providing insights into the distribution of vaccinations and deaths across different states over time.

In [6]:
import pandas as pd
import plotly.graph_objects as go

# Assuming 'merged_data' is your DataFrame and it's already properly loaded and aggregated

# Convert date column to datetime and extract month or appropriate time period
merged_data['date'] = pd.to_datetime(merged_data['date'])
merged_data['month'] = merged_data['date'].dt.strftime('%Y-%m')

# Pivot the data for vaccinations heatmap
vaccinations_pivot = merged_data.pivot_table(values='total_people_vaccinated', 
                                             index='state', 
                                             columns='month', 
                                             aggfunc='max',  # Assuming you want the max for the month
                                             fill_value=0)

# Pivot the data for deaths heatmap
deaths_pivot = merged_data.pivot_table(values='deaths', 
                                       index='state', 
                                       columns='month', 
                                       aggfunc='sum',  # Assuming you want the sum for the month
                                       fill_value=0)

# Vaccinations Heatmap
fig_vaccinations = go.Figure(data=go.Heatmap(
                    z=vaccinations_pivot.values,
                    x=vaccinations_pivot.columns,
                    y=vaccinations_pivot.index,
                    colorscale='Viridis'))

fig_vaccinations.update_layout(
    title='COVID-19 Vaccinations Heatmap',
    xaxis_nticks=36,
    yaxis_nticks=36)

# Deaths Heatmap
fig_deaths = go.Figure(data=go.Heatmap(
                    z=deaths_pivot.values,
                    x=deaths_pivot.columns,
                    y=deaths_pivot.index,
                    colorscale='Viridis'))

fig_deaths.update_layout(
    title='COVID-19 Deaths Heatmap',
    xaxis_nticks=36,
    yaxis_nticks=36)

# Show the figures
fig_vaccinations.show()
fig_deaths.show()

This code creates two Plotly heatmaps to visualize the impact of COVID-19 vaccinations on 'merged_data'. By pivoting the data to separate vaccinations and deaths, it illustrates the total number of people vaccinated and COVID-19-related deaths over time for each state. These visualizations provide insights into the effectiveness of vaccination efforts in reducing fatalities across states.

In [ ]: